Abstract 

Recently theoretical guarantees have been obtained for matrix completion in the non-uniform sampling 
regime. In particular, if the sampling distribution aligns with the underlying matrix’s leverage scores, then 
with high probability nuclear norm minimization will exactly recover the low rank matrix. In this article, 
we analyze the scenario in which the non-uniform sampling distribution may or may not align with 
the underlying matrix’s leverage scores. Here we explore learning the parameters for weighted nuclear 
norm minimization in terms of the empirical sampling distribution. We provide a sufficient condition on 
these learned weights that guarantees exact recovery by weighted nuclear norm minimization. 
It has been established that a specific choice of weights in terms of the true sampling distribution not only 
allows for weighted nuclear norm minimization to exactly recover the low rank matrix, but also allows 
for a quantifiable relaxation in the exact recovery conditions. In this article we extend this quantifiable 
relaxation in exact recovery conditions for a specific choice of weights defined analogously in terms of 
the empirical distribution as opposed to the true sampling distribution. To accomplish this we employ 
a concentration of measure bound and a large deviation bound. We also present numerical evidence for 
the healthy robustness of the weighted nuclear norm minimization algorithm to the choice of empirically 
learned weights. These numerical experiments show that for a variety of easily computable empirical 
weights, weighted nuclear norm minimization outperforms unweighted nuclear norm minimization in the 
non-uniform sampling regime. 


1 Introduction 

Matrix completion has become one of the more active fields in signal processing, enjoying numerous 
applications to data mining and machine learning tasks. The matrix completion problem is one where we are 
allowed to observe a small percentage of the entries in a data matrix M and from these known entries, we 
must infer the values of the remaining entries. This problem is severely ill-posed, particularly so in the high 
dimensional regime. To this end, one must typically assume some sort of low complexity prior on M , i.e. 
M is a low rank matrix or is well approximated by a low rank matrix. Using this hypothesis a wide range 
of theoretical guarantees have been established for matrix completion [1, 2, 3, 6, 8, 9, 11, 12]. As noted in 
[4], these articles share a common thread that the recovery guarantees all require that: 

• The method of sampling the data matrix M must be done in a uniformly random fashion, 

• And that the low-rank matrix M must satisfy a so-called “incoherence” property, which roughly means 
that the distribution of the entries of the matrix must have some form of uniform regularity (thereby 
allowing the uniform sampling strategy to be effective). 

In [4] it is observed that although the aforementioned articles differ in optimization techniques, ranging from 
convex relaxation via nuclear norm minimization [2] to non-convex alternating minimization [8] and iterative 
soft thresholding [1], all of these algorithms have exact recovery guarantees using as few as $O(nr \log n)$ 
observed elements for a square $n \times n$ matrix of rank $r$. 

One of the central issues in matrix completion is the relationship between the distribution of a matrix’s 
entries and the sampling distribution being employed. For instance, if a matrix is highly incoherent, it has 
much of its Frobenius norm energy spread throughout its entries in a relatively uniform fashion. To this end, 
taking a uniformly random sample of this matrix’s entries will be a sufficient representation to allow 
for exact recovery. However, if a matrix is highly coherent, in other words, it has much of its Frobenius norm 
concentrated in a relatively sparse number of its entries, intuitively we understand that a uniform sampling 
strategy will not yield a sufficiently representative sample to allow for exact recovery. 

Until recently, the exact nature of this relationship between the matrix $M$ and the sampling distribution $p$ 
had not been quantified beyond the uniform sampling case. In [4] this relationship is 
quantified. For the purposes of this article, we focus on two particular results established in [4]: 

• If the sampling distribution $p$ is proportional to the sum of the underlying matrix’s leverage scores, 
then any arbitrary $n \times n$ rank-$r$ matrix can be recovered from $O(nr \log^2 n)$ observed entries with high 
probability. The exact recovery guarantee is for the nuclear norm minimization algorithm [13]. 

• Given a set of weights $R$, $C$, a sufficiency condition on the sampling distribution $p$ is established. 
In particular, if the sampling distribution $p$ is proportional to a sum of these $R$, $C$ weights, then 
exact recovery guarantees are derived for weighted nuclear norm minimization (the particular form 
of weighted nuclear norm minimization objective was first posed in [14, 5]). Moreover, the benefit of 
weighted nuclear norm minimization vs. unweighted nuclear norm minimization is quantified with a 
specific set of weights R, C which are chosen in terms of the sampling distribution p. 

We are primarily interested in the second result on weighted nuclear norm minimization. We will explore 
the nature of the relationship between the weights $R$, $C$ and the empirical sampling distribution $\hat{p}$, as opposed 
to the true sampling distribution $p$. As previously noted, [4] established the efficacy of weights $R$, $C$ chosen 
in a specific fashion in terms of the sampling distribution $p$. However, we are interested in a setting where 
the sampling distribution $p$ is not known to us and no prior knowledge of $p$ is available. In this article, we 
make the following contributions: 

1. We extend the sufficiency condition from [4] to the case when the weights $R$, $C$ are functions of the 
empirical sampling distribution $\hat{p}$ for the exact recovery of $M$ using weighted nuclear norm minimization. 

2. We show that a specific choice of weights $R$, $C$ as functions of $\hat{p}$ produces a similar quantifiable 
relaxation in exact recovery conditions for weighted nuclear norm minimization vs. unweighted nuclear 
norm minimization. 

3. We numerically demonstrate the healthy robustness of the weighted nuclear norm minimization to 
the choice of the weights $R$, $C$, hearkening back to the previous work in non-uniform sampling and 
weighted matrix completion [14, 5]. We also demonstrate the superiority of weighted nuclear norm 
minimization over unweighted nuclear norm minimization in the non-uniform sampling regime. 

To obtain the above two theoretical guarantees we will use a large deviation bound and a concentration of measure 
bound from [7] to derive sufficient conditions under which we may use the empirical sampling distribution $\hat{p}$ as 
an effective proxy for the true sampling distribution $p$. The remainder of the article is organized as follows: in 
Section 2 we state our main results, in Section 3 we develop all the empirical estimation guarantees required 
to establish the matrix completion guarantees, in Section 4 we establish our matrix completion guarantees, 
and in Section 5 we present our numerical simulations. 

We use the notation $a \wedge b := \min(a, b)$ and $a \vee b := \max(a, b)$ throughout the article. 


2 Main Results 


Numerous matrix completion results [2, 3, 12, 13] have established the effectiveness of using nuclear norm 
minimization: 


\[
  \min_{X \in \mathbb{R}^{n_1 \times n_2}} \|X\|_* \quad \text{subject to} \quad X_{ij} = M_{ij} \ \text{ for } (i,j) \in \Omega, \tag{1}
\]


as a method of performing matrix completion, or in general low rank matrix recovery tasks. However, all of 
these results may be classified as being in the uniform sampling regime. To this end, recently [4] established 
that (1) can exactly recover an $n \times n$ square matrix $M$ of rank $r$ from $O(nr \log^2 n)$ samples as long as the 
sampling distribution $p$ and $M$'s row and column leverage scores $\{\mu_i(M)\}_{i=1}^{n}$, $\{\nu_j(M)\}_{j=1}^{n}$, respectively, satisfy 
the following inequality: 

\[
  p_{ij} \ge \min\left\{ c_0 \frac{(\mu_i(M) + \nu_j(M))\, r \log^2(2n)}{n},\ 1 \right\} \quad \text{for all } (i,j) \in [n] \times [n], \tag{2}
\]


for some universal constant $c_0$. With (2) the quantitative relationship between the degree of non-uniformity of the 
sampling distribution p and the corresponding coherence statistics of the matrix M has been established. 

Consider now a different scenario, one in which the sampling distribution $p$ and the underlying matrix’s 
leverage scores $\{\mu_i(M)\}_{i=1}^{n_1}$, $\{\nu_j(M)\}_{j=1}^{n_2}$ do not align according to (2). One technique to remedy this 
situation is to apply a transformation to $M$ that adjusts the leverage scores to align with 
the sampling distribution $p$. Following [5, 14] we choose weights of the form $R := \mathrm{diag}(R_1, \ldots, R_{n_1}) \in 
\mathbb{R}^{n_1 \times n_1}$, $C := \mathrm{diag}(C_1, \ldots, C_{n_2}) \in \mathbb{R}^{n_2 \times n_2}$. Using these parameterized weights, we will use $M \mapsto RMC$ as 
our transformation, which adjusts the leverage scores of $M$. In [14] a weighted nuclear norm objective was 
proposed. Following [5, 4], we will consider the following weighted nuclear norm optimization problem: 

\[
  \hat{M} = \arg\min_{X \in \mathbb{R}^{n_1 \times n_2}} \|RXC\|_* \quad \text{subject to} \quad X_{ij} = M_{ij} \ \text{ for } (i,j) \in \Omega. \tag{3}
\]
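As a small illustration of the objective in (3), the following NumPy sketch evaluates the weighted nuclear norm $\|RXC\|_*$ for diagonal weights. The function name `weighted_nuclear_norm` and the toy inputs are assumptions made for illustration only; they are not taken from [4], [5], or [14], and the sketch evaluates the objective rather than solving the constrained program.

```python
import numpy as np

def weighted_nuclear_norm(X, r, c):
    """Return ||R X C||_* for R = diag(r) and C = diag(c)."""
    # Scaling row i by r[i] and column j by c[j] equals diag(r) @ X @ diag(c).
    scaled = r[:, None] * X * c[None, :]
    return np.linalg.norm(scaled, ord="nuc")

# Toy rank-1 example (hypothetical values).
X = np.outer([1.0, 2.0], [3.0, 4.0])
r = np.array([1.0, 0.5])
c = np.array([2.0, 1.0])
unweighted = weighted_nuclear_norm(X, np.ones(2), np.ones(2))
weighted = weighted_nuclear_norm(X, r, c)
```

For a rank-1 matrix $uv^T$ the nuclear norm is $\|u\|_2 \|v\|_2$, which makes the effect of the row and column scaling easy to check by hand.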


In [4] exact recovery guarantees for (3) were established for weights $R$, $C$ which were defined in terms of the 
true sampling distribution p, which we state for the square n x n case: 

Theorem 2.1. Let $M = (M_{ij})$ be an $n \times n$ matrix of rank $r$, and suppose that its elements $M_{ij}$ are observed 
only over a subset of elements $\Omega \subset [n] \times [n]$. Without loss of generality, assume $R_1 \le R_2 \le \cdots \le R_n$ and 
$C_1 \le C_2 \le \cdots \le C_n$. There exists a universal constant $c_0$ such that $M$ is the unique optimum to (3) with 
probability at least $1 - 5(2n)^{-10}$ provided that for all $(i,j) \in [n] \times [n]$, $p_{ij} \ge n^{-10}$ and: 

\[
  p_{ij} \ge c_0 \left( \frac{R_i^2}{\sum_{i'=1}^{\lfloor n/(\mu_0 r) \rfloor} R_{i'}^2} + \frac{C_j^2}{\sum_{j'=1}^{\lfloor n/(\mu_0 r) \rfloor} C_{j'}^2} \right) \log^2(2n). \tag{4}
\]

Note that for monotonically increasing weights $R$, $C$ the corresponding support sets $S_r$, $S_c$ are merely 
the first $\lfloor n/(\mu_0 r) \rfloor$ indices, respectively. 

For the remainder of the article, we shall assume that our sampling distribution $p$ has a product form 
$p_{ij} = p^r_i p^c_j$ for all $(i,j) \in [n_1] \times [n_2]$. Furthermore, we will consider the following two-stage sampling model: 

• Stage 1 (Empirical Sampling Distribution): We sample the distribution $p$ $m$ times independently 
with replacement, but the corresponding entries of the data matrix $M$ are not revealed to us. In other 
words, we are sampling the sampling distribution, but not the underlying matrix $M$. 

• Stage 2 (Sampling the Matrix): We then, independent of the first stage, sample the matrix $M$ using 
the independent Bernoulli model for each entry $(i,j) \in [n_1] \times [n_2]$. 


Note that this two stage sampling model allows one to sample the sampling distribution $p$ without revealing 
the entries of $M$. In this manner we may design weights $R$, $C$ which depend on the empirical sampling 
distribution $\hat{p}$ and obtain matrix completion guarantees for these weights in the usual (stage two) independent 
Bernoulli sampling model that has been typically used in the matrix completion literature. 
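The two-stage model can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the function name `two_stage_sample` and the toy marginals are hypothetical, the row and column factors are assumed to give $p_{ij} \in [0,1]$, and stage one draws from the normalized product distribution by sampling row and column indices independently.

```python
import numpy as np

def two_stage_sample(p_row, p_col, m, rng):
    """Sketch of the two-stage model for p_ij = p_row[i] * p_col[j]."""
    n1, n2 = len(p_row), len(p_col)
    # Stage 1: m index pairs drawn i.i.d. with replacement from the normalized
    # product distribution; no entry of M is looked at here.
    rows = rng.choice(n1, size=m, p=p_row / p_row.sum())
    cols = rng.choice(n2, size=m, p=p_col / p_col.sum())
    # Stage 2: independent Bernoulli(p_ij) mask deciding which entries of M
    # are revealed, drawn independently of stage 1.
    p = np.outer(p_row, p_col)
    mask = rng.random((n1, n2)) < p
    return rows, cols, mask

rng = np.random.default_rng(0)
p_row = np.array([0.9, 0.6, 0.3])   # hypothetical row factors
p_col = np.array([0.8, 0.4])        # hypothetical column factors
rows, cols, mask = two_stage_sample(p_row, p_col, m=50, rng=rng)
```

Only `rows` and `cols` would be used to learn weights; `mask` plays the role of the observed set in the stage two Bernoulli model.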

In this article we present stage one sampling bounds which will allow $\hat{p}$ to be used as an empirical proxy 
for $p$ to design weights $R$, $C$ for (3) and obtain exact recovery with high probability. To this end, we establish 
the following two empirical estimation lemmas, which will serve as the foundation to our matrix completion 
guarantees. The first is a one sided large deviation bound: 


Lemma 2.2. Let $p$ denote a probability mass function on $[n_1] \times [n_2]$ and suppose $p$ has a product form, 
i.e. for all $(i,j) \in [n_1] \times [n_2]$: $p_{ij} = p^r_i p^c_j$ for $p^r$, $p^c$ probability mass functions on $[n_1]$, $[n_2]$, respectively. 
Let $X_1, \ldots, X_m \sim p$ be a sequence of $m$ i.i.d. samples. For any $\alpha \in \left(0, \left(\min_{i \in [n_1]} p^r_i \vee \min_{j \in [n_2]} p^c_j\right)^{-1}\right)$ and 
$\varepsilon \in (0,1)$, if the number of samples $m$ is chosen such that: 

\[
  m = \frac{1}{2} \left( \alpha \left( \min_{i \in [n_1]} p^r_i \wedge \min_{j \in [n_2]} p^c_j \right) \right)^{-2} \log\left(\varepsilon^{-1}(n_1 + n_2)\right), \tag{5}
\]

then with probability at least $1 - \varepsilon$ we have that for all $(i,j) \in [n_1] \times [n_2]$: 

\[
  \hat{p}_{ij} \le (1 + \alpha)^2 p_{ij}. \tag{6}
\]

We also establish the following two sided empirical bound for the estimation of product distributions: 

Lemma 2.3. Let $p$ denote a probability mass function on $[n_1] \times [n_2]$ and suppose $p$ has a product form, 
i.e. for all $(i,j) \in [n_1] \times [n_2]$: $p_{ij} = p^r_i p^c_j$ for $p^r$, $p^c$ probability mass functions on $[n_1]$, $[n_2]$, respectively. 
Let $X_1, \ldots, X_m \sim p$ be a sequence of $m$ i.i.d. samples. For any $\alpha \in \left(0, \left(\min_{i \in [n_1]} p^r_i \vee \min_{j \in [n_2]} p^c_j\right)^{-1}\right)$ and 
$\varepsilon \in (0,1)$, if the number of samples $m$ is chosen such that: 

\[
  m = \frac{1}{2} \left( \alpha \left( \min_{i \in [n_1]} p^r_i \wedge \min_{j \in [n_2]} p^c_j \right) \right)^{-2} \log\left(2\varepsilon^{-1}(n_1 + n_2)\right), \tag{7}
\]

then with probability at least $1 - \varepsilon$ we have that for all $(i,j) \in [n_1] \times [n_2]$: 

\[
  \frac{1}{(1 + \alpha)^2}\, \hat{p}_{ij} \le p_{ij} \le \frac{1}{(1 - \alpha)^2}\, \hat{p}_{ij}. \tag{8}
\]


Note that Lemmas 2.2 and 2.3 are general results for the empirical estimation of any distribution $p$ over 
$[n_1] \times [n_2]$ which has a product form. Recall that the sampling model employed in [4] is a sequence of $n_1 \cdot n_2$ 
independent Bernoulli random variables, with each Bernoulli random variable having success probability $p_{ij}$ 
for $(i,j) \in [n_1] \times [n_2]$. Therefore, $p$ may not be a probability matrix on $[n_1] \times [n_2]$ as it may not sum to 1. 
To this end, we note that when we sample $p$, we are really sampling the normalized matrix $\frac{1}{\sum_{i,j} p_{ij}}\, p$. So our 
empirical estimator $\hat{p}$ is estimating the normalized probability matrix $\frac{1}{\sum_{i,j} p_{ij}}\, p$ and not $p$ itself. Therefore, 
in order to apply the above lemmas we must account for this normalization constant. 
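To make the sample-size requirements of Lemmas 2.2 and 2.3 concrete, the following sketch evaluates the stated formula for $m$ on the normalized marginals. The helper name `stage_one_samples` is hypothetical, and rounding up to an integer is our own choice (the lemmas state $m$ as a real number); taking more samples only decreases the failure bound.

```python
import math

def stage_one_samples(p_row, p_col, alpha, eps, two_sided=False):
    """Sample size from Lemma 2.2 (one sided) or Lemma 2.3 (two sided),
    applied to the normalized row and column marginals."""
    z_row, z_col = sum(p_row), sum(p_col)
    # Smallest normalized marginal over rows and columns.
    p_min = min(min(p_row) / z_row, min(p_col) / z_col)
    count = len(p_row) + len(p_col)          # n_1 + n_2 in the union bound
    factor = 2.0 if two_sided else 1.0       # extra factor 2 for the two sided bound
    m = math.log(factor * count / eps) / (2.0 * (alpha * p_min) ** 2)
    return math.ceil(m)                      # rounding up is an assumption

# Hypothetical example: uniform marginals on 10 rows and 10 columns.
p_row = [0.1] * 10
p_col = [0.1] * 10
m_one = stage_one_samples(p_row, p_col, alpha=0.5, eps=0.1)
m_two = stage_one_samples(p_row, p_col, alpha=0.5, eps=0.1, two_sided=True)
```

Because $m$ scales like the inverse square of the smallest marginal, heavy-tailed sampling distributions can require a very large number of stage one samples under these bounds.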

Using the above, we will obtain two weighted matrix completion guarantees. For simplicity, we will prove 
all our results for the case when M is a square n x n matrix. The first guarantee will be a sufficiency 
condition for the weights $R$, $C$ in terms of the empirical estimator $\hat{p}$ which will ensure exact recovery by 
weighted nuclear norm minimization with high probability: 

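Theorem 2.4. Let $M = (M_{ij})$ be an $n \times n$ matrix of rank $r$, and suppose that its elements are observed 
only over a subset of elements $\Omega \subset [n] \times [n]$. Let $\varepsilon \in (0,1)$ be arbitrary. Suppose that there exists an 
$\alpha \in \left(0, \left(\min_{i \in [n]} p^r_i / \big(\sum_{i' \in [n]} p^r_{i'}\big) \vee \min_{j \in [n]} p^c_j / \big(\sum_{j' \in [n]} p^c_{j'}\big)\right)^{-1}\right)$ and some universal constant $c_0$ such that for 
all indices $(i,j) \in [n] \times [n]$ the weights $R$, $C$ satisfy the following inequalities: 

\[
  \hat{p}_{ij} \ge \frac{(1 + \alpha)^2}{\sum_{i,j} p_{ij}}\, c_0 \left( \frac{R_i^2}{\sum_{i' \in S_r} R_{i'}^2} + \frac{C_j^2}{\sum_{j' \in S_c} C_{j'}^2} \right) \log^2(2n), \tag{9}
\]

where $S_r$, $S_c$ denote the $\lfloor n/(\mu_0 r) \rfloor$ entries of least magnitude of $R$, $C$, respectively. If the number of stage 
one samples $m$ is chosen such that: 

\[
  m = \frac{1}{2} \left( \alpha \left( \min_{i \in [n]} \frac{p^r_i}{\sum_{i'=1}^{n} p^r_{i'}} \wedge \min_{j \in [n]} \frac{p^c_j}{\sum_{j'=1}^{n} p^c_{j'}} \right) \right)^{-2} \log(2\varepsilon^{-1} n),
\]

and if for all $(i,j) \in [n] \times [n]$, $p_{ij} \ge n^{-10}$, then with probability at least $(1 - 5(2n)^{-10})(1 - \varepsilon)$, $M$ is the unique 
optimum to (3), where $\Omega$ is obtained via the usual (stage two) independent, entry-wise Bernoulli sampling 
of $M$. 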

Our second weighted matrix completion guarantee will be for the exact recovery properties of a set of 
weights $R$, $C$ explicitly defined in terms of the empirical distribution $\hat{p}$: 


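Theorem 2.5. Let $M$ be a square $n \times n$ rank-$r$ matrix with coherence $\mu_0$. Consider the weights defined by: 

\[
  R_i = \sqrt{\hat{p}^r_i \sum_{j' \in S_c} \hat{p}^c_{j'}}, \quad \text{for } i = 1, \ldots, n, \tag{10}
\]

\[
  C_j = \sqrt{\hat{p}^c_j \sum_{i' \in S_r} \hat{p}^r_{i'}}, \quad \text{for } j = 1, \ldots, n, \tag{11}
\]

where $S_r$, $S_c$ denote the $\lfloor n/(\mu_0 r) \rfloor$ entries of $\hat{p}^r$, $\hat{p}^c$ of least magnitude, respectively. Suppose that there 
exists an $\alpha \in \left(0, \left(\min_{i \in [n]} p^r_i / \big(\sum_{i' \in [n]} p^r_{i'}\big) \vee \min_{j \in [n]} p^c_j / \big(\sum_{j' \in [n]} p^c_{j'}\big)\right)^{-1}\right)$ such that the (unnormalized) matrix 
$p$ satisfies, for all $(i,j) \in [n] \times [n]$ and for the sets $S_r^*$, $S_c^*$ which denote the $\lfloor n/(\mu_0 r) \rfloor$ entries of $p^r$, $p^c$ of least 
magnitude, respectively, the following: 

\[
  p^c_j \sum_{i' \in S_r^*} p^r_{i'} \ge c_0\, \frac{2(1 + \alpha)^2}{(1 - \alpha)^2} \log^2(2n), \tag{12}
\]

\[
  p^r_i \sum_{j' \in S_c^*} p^c_{j'} \ge c_0\, \frac{2(1 + \alpha)^2}{(1 - \alpha)^2} \log^2(2n). \tag{13}
\]

If the number of stage one samples $m$ is chosen such that: 

\[
  m = \frac{1}{2} \left( \alpha \left( \min_{i \in [n]} \frac{p^r_i}{\sum_{i'=1}^{n} p^r_{i'}} \wedge \min_{j \in [n]} \frac{p^c_j}{\sum_{j'=1}^{n} p^c_{j'}} \right) \right)^{-2} \log(4\varepsilon^{-1} n),
\]

then with probability at least $(1 - 5(2n)^{-10})(1 - \varepsilon)$, $M$ is the unique optimum to (3), where $\Omega$ is obtained via the 
usual (stage two) independent, entry-wise Bernoulli sampling of $M$. 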


Note: Unweighted nuclear norm minimization attains exact recovery under the condition that for all 
$(i,j) \in [n] \times [n]$: 

\[
  p^r_i p^c_j \gtrsim \frac{\mu_0 r}{n} \log^2(2n). \tag{14}
\]

However, as Theorem 2.5 establishes, weighted nuclear norm minimization with the choice of weights (10) and 
(11) attains exact recovery subject to the less restrictive sufficient recovery condition that: 

\[
  p^c_j \sum_{i' \in S_r^*} p^r_{i'} \gtrsim \log^2(2n), \qquad
  p^r_i \sum_{j' \in S_c^*} p^c_{j'} \gtrsim \log^2(2n).
\]

This is precisely the condition from [4]. 
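The weight construction can be sketched as follows, assuming weights of the form $R_i = \sqrt{\hat{p}^r_i \sum_{j' \in S_c} \hat{p}^c_{j'}}$ and $C_j = \sqrt{\hat{p}^c_j \sum_{i' \in S_r} \hat{p}^r_{i'}}$, with $S_r$, $S_c$ the $k = \lfloor n/(\mu_0 r) \rfloor$ smallest empirical marginals. The function name `empirical_weights`, the argument `k`, and the toy values are hypothetical and only serve the illustration.

```python
import numpy as np

def empirical_weights(p_row_hat, p_col_hat, k):
    """Diagonal entries of R and C built from empirical marginals.

    k plays the role of floor(n / (mu0 * r)).
    """
    # Sums over S_r and S_c: the k smallest row and column estimates.
    s_r = np.sort(p_row_hat)[:k].sum()
    s_c = np.sort(p_col_hat)[:k].sum()
    R = np.sqrt(p_row_hat * s_c)
    C = np.sqrt(p_col_hat * s_r)
    return R, C

# Hypothetical empirical marginals for n = 4 and k = 2.
p_row_hat = np.array([0.1, 0.2, 0.3, 0.4])
p_col_hat = np.full(4, 0.25)
R, C = empirical_weights(p_row_hat, p_col_hat, k=2)
```

With this choice the ratio $R_i^2 / \sum_{i' \in S_r} R_{i'}^2$ reduces to $\hat{p}^r_i / \sum_{i' \in S_r} \hat{p}^r_{i'}$, since the column factor cancels. Rows or columns that never appear in stage one get weight zero in this sketch, which a practical implementation would likely need to guard against.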

3 Empirical Estimation 

We consider probability mass functions $p$ on $[n_1] \times [n_2]$ which have a product form $p_{ij} = p^r_i p^c_j$ for $(i,j) \in 
[n_1] \times [n_2]$. We will sample this distribution with replacement $m$ times. The samples $X_1, \ldots, X_m \sim p$ 
are row and column pairs, i.e. $X_k \in [n_1] \times [n_2]$ for each $k = 1, \ldots, m$. We may define the row and column 
empirical estimators: 

Definition 3.1. The row and column empirical estimators $\hat{p}^r$, $\hat{p}^c$, respectively, are defined as: 

\[
  \hat{p}^r_i := \frac{1}{m} \sum_{k=1}^{m} \delta^r_i(X_k), \quad \text{for } i \in [n_1], \tag{15}
\]

\[
  \hat{p}^c_j := \frac{1}{m} \sum_{k=1}^{m} \delta^c_j(X_k), \quad \text{for } j \in [n_2], \tag{16}
\]

where for any $X_k$: 

\[
  \delta^r_i(X_k) = \begin{cases} 1 & \text{if } X_k \text{ is from row } i, \\ 0 & \text{otherwise,} \end{cases}
  \qquad
  \delta^c_j(X_k) = \begin{cases} 1 & \text{if } X_k \text{ is from column } j, \\ 0 & \text{otherwise.} \end{cases}
\]

For the remainder of the article, we will let $\hat{p}$ denote the empirical product estimate, i.e. $\hat{p}_{ij} = \hat{p}^r_i \hat{p}^c_j$. 
Observe that in (15) and (16) each component of our row and column empirical estimators involves 
a sum of independent random variables bounded in $[0,1]$, as $\delta^r_i(X_k), \delta^c_j(X_k) \in \{0,1\}$ for any $(i,j,k) \in 
[n_1] \times [n_2] \times [m]$. In this situation, we may use Hoeffding’s inequalities [7] to obtain probabilistic 
approximation guarantees. For our purposes, we will be using two forms of Hoeffding’s inequalities: a one 
sided large deviation bound and a two sided concentration of measure bound. 
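The row and column empirical estimators are simply normalized counts of how often each row and column index appears among the $m$ samples. A minimal NumPy sketch follows; the function name `empirical_marginals` and the toy sample are hypothetical.

```python
import numpy as np

def empirical_marginals(rows, cols, n1, n2):
    """Row/column empirical estimators and their product estimate."""
    m = len(rows)
    # Count occurrences of each index; indices never sampled get estimate 0.
    p_row_hat = np.bincount(rows, minlength=n1) / m
    p_col_hat = np.bincount(cols, minlength=n2) / m
    p_hat = np.outer(p_row_hat, p_col_hat)   # empirical product estimate
    return p_row_hat, p_col_hat, p_hat

# Four hypothetical samples X_k = (row, column) on a 3 x 2 index set.
rows = np.array([0, 0, 1, 2])
cols = np.array([1, 1, 1, 0])
p_row_hat, p_col_hat, p_hat = empirical_marginals(rows, cols, n1=3, n2=2)
```

Here `p_row_hat` is `[0.5, 0.25, 0.25]` and `p_col_hat` is `[0.25, 0.75]`, and the product estimate sums to one by construction.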

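Theorem 3.2. (Hoeffding Inequalities) Let $Z_1, \ldots, Z_m$ be independent random variables such that each 
$Z_i \in [a_i, b_i]$ with probability 1. Let $S_m = \sum_{i=1}^{m} Z_i$. Then for any $t > 0$ we have: 

\[
  \Pr[S_m - E[S_m] \ge t] \le \exp\left( -\frac{2t^2}{\sum_{i=1}^{m} (b_i - a_i)^2} \right), \tag{17}
\]

\[
  \Pr[|S_m - E[S_m]| \ge t] \le 2\exp\left( -\frac{2t^2}{\sum_{i=1}^{m} (b_i - a_i)^2} \right). \tag{18}
\]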

For any $i \in [n_1]$, we may define $m$ random variables $Z^r_{i,k} := \delta^r_i(X_k)$ for $k = 1, \ldots, m$. Note that each 
random variable $Z^r_{i,k}$ only takes values in $\{0,1\}$ and thus is bounded in $[0,1]$ with probability 1. As each $X_k$ 
is merely a row and column index, and the $\delta^r_i$, $\delta^c_j$ are row and column indicator functions, we have that for 
fixed $i$ the set $\{Z^r_{i,k}\}_{k=1}^{m}$ (and similarly for the column case) is an independent set of random variables. Therefore the 
hypotheses of Theorem 3.2 are satisfied. For each $i \in [n_1]$ we may define the sum $S^r_{i,m} := \sum_{k=1}^{m} Z^r_{i,k}$. Each 
$S^r_{i,m}$ has expected value $E[S^r_{i,m}] = m p^r_i$. Analogous results hold for the column case. With the above pair 
of Hoeffding inequalities in hand, we are now ready to establish our main lemmas. For the proof of Lemma 
2.2 we will apply (17) and for the proof of Lemma 2.3 we will apply (18). 
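For indicator variables each $b_i - a_i = 1$, so the denominator in the Hoeffding bounds equals $m$, and a deviation of $s$ in the empirical mean corresponds to $t = ms$ for the sum. The following sketch evaluates the resulting bounds $\exp(-2s^2 m)$ and $2\exp(-2s^2 m)$; the function names are hypothetical, and capping the two sided bound at 1 is our own convenience.

```python
import math

def hoeffding_one_sided(s, m):
    """Bound on Pr[mean - E[mean] >= s] for m independent [0, 1] variables."""
    # With t = m * s and sum (b_i - a_i)^2 = m, 2 t^2 / m = 2 s^2 m.
    return math.exp(-2.0 * s * s * m)

def hoeffding_two_sided(s, m):
    """Bound on Pr[|mean - E[mean]| >= s], capped at 1 (the cap is our choice)."""
    return min(1.0, 2.0 * math.exp(-2.0 * s * s * m))

# Hypothetical example: deviation 0.1 with 100 samples.
one = hoeffding_one_sided(0.1, 100)
two = hoeffding_two_sided(0.1, 100)
```

In this example the exponent is $-2$, so the one sided bound is $e^{-2}$ and the two sided bound is $2e^{-2}$.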


3.1 Proof of Lemma 2.2 


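Proof. We start our proof by analyzing empirical estimation of the row distribution; the analysis for the 
column distribution will be identical. For any $i \in [n_1]$, $\alpha > 0$, choosing $t = m\alpha \min_{i' \in [n_1]} p^r_{i'}$ (that is, a deviation of 
$\alpha \min_{i' \in [n_1]} p^r_{i'}$ for $\hat{p}^r_i = S^r_{i,m}/m$), by (17) we have that: 

\[
  \Pr\left[\hat{p}^r_i - p^r_i \ge \alpha \min_{i' \in [n_1]} p^r_{i'}\right] \le \exp\left(-2\Big(\alpha \min_{i' \in [n_1]} p^r_{i'}\Big)^2 m\right). \tag{19}
\]

We may repeat the analysis for the column case, where we choose $t = m\alpha \min_{j' \in [n_2]} p^c_{j'}$; then analogously: 

\[
  \Pr\left[\hat{p}^c_j - p^c_j \ge \alpha \min_{j' \in [n_2]} p^c_{j'}\right] \le \exp\left(-2\Big(\alpha \min_{j' \in [n_2]} p^c_{j'}\Big)^2 m\right). \tag{20}
\]

For any $i \in [n_1]$ let $E^r_i$ denote the event that $\hat{p}^r_i - p^r_i \ge \alpha \min_{i' \in [n_1]} p^r_{i'}$ and for any $j \in [n_2]$ let $E^c_j$ denote the 
event that $\hat{p}^c_j - p^c_j \ge \alpha \min_{j' \in [n_2]} p^c_{j'}$. 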

We must choose $\alpha > 0$ such that the bounds in (19), (20) are nontrivial. In particular, any two probability 
vectors cannot have their components differ by more than 1. Therefore, we require that $\alpha$ satisfies: 

\[
  \alpha \min_{i \in [n_1]} p^r_i < 1 \quad \text{and} \quad \alpha \min_{j \in [n_2]} p^c_j < 1.
\]

To this end it suffices to choose $\alpha \in \left(0, \left(\min_{i \in [n_1]} p^r_i \vee \min_{j \in [n_2]} p^c_j\right)^{-1}\right)$. 

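By (19), (20) and the union bound we have that: 

\begin{align}
  \Pr\left[\text{for some } (i,j) \text{ the event } E^r_i \text{ or } E^c_j \text{ occurs}\right]
  &\le n_1 \exp\left(-2\Big(\alpha \min_{i \in [n_1]} p^r_i\Big)^2 m\right) + n_2 \exp\left(-2\Big(\alpha \min_{j \in [n_2]} p^c_j\Big)^2 m\right) \notag \\
  &\le (n_1 + n_2) \exp\left(-2\Big(\alpha \Big(\min_{i \in [n_1]} p^r_i \wedge \min_{j \in [n_2]} p^c_j\Big)\Big)^2 m\right). \tag{21}
\end{align}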

Observe that (21) immediately yields that with probability at least $1 - (n_1 + n_2) \exp\left(-2\big(\alpha (\min_{i \in [n_1]} p^r_i \wedge 
\min_{j \in [n_2]} p^c_j)\big)^2 m\right)$, for any $(i,j) \in [n_1] \times [n_2]$ the following bounds hold: 

\[
  \hat{p}^r_i - p^r_i \le \alpha \min_{i' \in [n_1]} p^r_{i'}, \tag{22}
\]

\[
  \hat{p}^c_j - p^c_j \le \alpha \min_{j' \in [n_2]} p^c_{j'}. \tag{23}
\]

Since $\min_{i'} p^r_{i'} \le p^r_i$ and $\min_{j'} p^c_{j'} \le p^c_j$, the bounds (22) and (23) give $\hat{p}^r_i \le (1 + \alpha) p^r_i$ and $\hat{p}^c_j \le (1 + \alpha) p^c_j$. 
Therefore, with the same probability, we may conclude that for all $(i,j) \in 
[n_1] \times [n_2]$ the following bound is true: 

\[
  \hat{p}_{ij} \le (1 + \alpha)^2 p_{ij}. \tag{24}
\]


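For any $\varepsilon \in (0,1)$ choosing $m$ such that: 

\[
  m = \frac{\log\left((n_1 + n_2)\varepsilon^{-1}\right)}{2\left(\alpha \left(\min_{i \in [n_1]} p^r_i \wedge \min_{j \in [n_2]} p^c_j\right)\right)^2} \tag{25}
\]

guarantees that (24) holds with probability at least $1 - \varepsilon$ and the proof is complete. □ 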


3.2 Proof of Lemma 2.3 

Proof. The proof of Lemma 2.3 is similar to the previous proof but we include the full proof for 
completeness. We start our proof by analyzing empirical estimation of the row distribution; the analysis for 
the column distribution will be identical. Following the previous section we restrict ourselves to choose 
$\alpha \in \left(0, \left(\min_{i \in [n_1]} p^r_i \vee \min_{j \in [n_2]} p^c_j\right)^{-1}\right)$. For any $i \in [n_1]$ choosing $t = m\alpha \min_{i' \in [n_1]} p^r_{i'}$, by (18) we have that: 

\[
  \Pr\left[|\hat{p}^r_i - p^r_i| \ge \alpha \min_{i' \in [n_1]} p^r_{i'}\right] \le 2\exp\left(-2\Big(\alpha \min_{i' \in [n_1]} p^r_{i'}\Big)^2 m\right). \tag{26}
\]

We may repeat the analysis for the column case, where we choose $t = m\alpha \min_{j' \in [n_2]} p^c_{j'}$; then analogously: 

\[
  \Pr\left[|\hat{p}^c_j - p^c_j| \ge \alpha \min_{j' \in [n_2]} p^c_{j'}\right] \le 2\exp\left(-2\Big(\alpha \min_{j' \in [n_2]} p^c_{j'}\Big)^2 m\right). \tag{27}
\]


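For any $i \in [n_1]$ let $E^r_i$ denote the event that $|\hat{p}^r_i - p^r_i| \ge \alpha \min_{i' \in [n_1]} p^r_{i'}$ and for any $j \in [n_2]$ let $E^c_j$ denote 
the event that $|\hat{p}^c_j - p^c_j| \ge \alpha \min_{j' \in [n_2]} p^c_{j'}$. By (26), (27) and the union bound we have that: 

\begin{align}
  \Pr\left[\text{for some } (i,j) \text{ the event } E^r_i \text{ or } E^c_j \text{ occurs}\right]
  &\le 2\left( n_1 \exp\left(-2\Big(\alpha \min_{i \in [n_1]} p^r_i\Big)^2 m\right) + n_2 \exp\left(-2\Big(\alpha \min_{j \in [n_2]} p^c_j\Big)^2 m\right) \right) \notag \\
  &\le 2(n_1 + n_2) \exp\left(-2\Big(\alpha \Big(\min_{i \in [n_1]} p^r_i \wedge \min_{j \in [n_2]} p^c_j\Big)\Big)^2 m\right). \tag{28}
\end{align}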


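Observe that (28) immediately yields that with probability at least $1 - 2(n_1 + n_2) \exp\left(-2\big(\alpha (\min_{i \in [n_1]} p^r_i \wedge 
\min_{j \in [n_2]} p^c_j)\big)^2 m\right)$, for any $(i,j) \in [n_1] \times [n_2]$ the two following bounds hold: 

\[
  |\hat{p}^r_i - p^r_i| \le \alpha \min_{i' \in [n_1]} p^r_{i'}, \tag{29}
\]

\[
  |\hat{p}^c_j - p^c_j| \le \alpha \min_{j' \in [n_2]} p^c_{j'}. \tag{30}
\]

The bound (29) is equivalent to the following: 

\[
  -\alpha \min_{i' \in [n_1]} p^r_{i'} \le \hat{p}^r_i - p^r_i \le \alpha \min_{i' \in [n_1]} p^r_{i'},
\]

and, since $\min_{i'} p^r_{i'} \le p^r_i$, the above inequality yields that for any $i \in [n_1]$ (where the upper bound uses $\alpha < 1$): 

\[
  \frac{1}{1 + \alpha}\, \hat{p}^r_i \le p^r_i \le \frac{1}{1 - \alpha}\, \hat{p}^r_i. \tag{31}
\]

Similarly (30) implies that for any $j \in [n_2]$: 

\[
  \frac{1}{1 + \alpha}\, \hat{p}^c_j \le p^c_j \le \frac{1}{1 - \alpha}\, \hat{p}^c_j. \tag{32}
\]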
Combining (31) and (32), we have that with probability at least $1 - 2(n_1 + n_2) \exp\left(-2\big(\alpha (\min_{i \in [n_1]} p^r_i \wedge 
\min_{j \in [n_2]} p^c_j)\big)^2 m\right)$: 

\[
  \frac{1}{(1 + \alpha)^2}\, \hat{p}_{ij} \le p_{ij} \le \frac{1}{(1 - \alpha)^2}\, \hat{p}_{ij}. \tag{33}
\]

For any $\varepsilon \in (0,1)$ note that if we choose: 

\[
  m = \frac{\log\left(2(n_1 + n_2)\varepsilon^{-1}\right)}{2\left(\alpha \left(\min_{i \in [n_1]} p^r_i \wedge \min_{j \in [n_2]} p^c_j\right)\right)^2}, \tag{34}
\]

then (33) holds with probability at least $1 - \varepsilon$ and the proof is complete. □ 


4 Matrix Completion Guarantees 

With Lemma 2.2 in hand, we are prepared to prove Theorem 2.4 in Section 4.1. In Section 4.2, using Lemma 
2.3 we will prove Theorem 2.5 which quantifies the relaxation of the condition for which (3) succeeds in 
obtaining exact recovery using the empirically learned weights when compared to unweighted nuclear norm 
minimization. 

4.1 Proof of Theorem 2.4 

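Proof. For any $\alpha \in \left(0, \left(\min_{i \in [n]} p^r_i / \big(\sum_{i'=1}^{n} p^r_{i'}\big) \vee \min_{j \in [n]} p^c_j / \big(\sum_{j'=1}^{n} p^c_{j'}\big)\right)^{-1}\right)$ and $\varepsilon \in (0,1)$, if we choose 

\[
  m = \frac{1}{2} \left( \alpha \left( \min_{i \in [n]} \frac{p^r_i}{\sum_{i'=1}^{n} p^r_{i'}} \wedge \min_{j \in [n]} \frac{p^c_j}{\sum_{j'=1}^{n} p^c_{j'}} \right) \right)^{-2} \log(2\varepsilon^{-1} n), \tag{35}
\]

by Lemma 2.2 we have that with probability at least $1 - \varepsilon$ for any $(i,j) \in [n] \times [n]$: 

\[
  \frac{p_{ij}}{\sum_{i,j} p_{ij}} \ge \frac{1}{(1 + \alpha)^2}\, \hat{p}_{ij}. \tag{36}
\]

Observe that if the weights $R$, $C$ satisfy (9) for $\alpha$, we have that: 

\[
  p_{ij} \ge \frac{\sum_{i,j} p_{ij}}{(1 + \alpha)^2}\, \hat{p}_{ij} \ge c_0 \left( \frac{R_i^2}{\sum_{i' \in S_r} R_{i'}^2} + \frac{C_j^2}{\sum_{j' \in S_c} C_{j'}^2} \right) \log^2(2n). \tag{37}
\]

By Theorem 2.1, (37) is sufficient to guarantee exact recovery of $M$ via (3) with probability at least $1 - 
5(2n)^{-10}$. As stage one and stage two sampling are independent, we conclude that (3) attains exact recovery 
with probability at least $(1 - 5(2n)^{-10})(1 - \varepsilon)$. □ 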

4.2 Weighted Nuclear Norm and Relaxation of Sufficient Recovery Conditions 

With Theorem 2.4 we established some sufficient conditions for the weights R , C in order for (3) to attain 
exact recovery. In this section we will establish exact recovery guarantees for a specific set of weights defined 
in terms of the empirical sampling distribution $\hat{p}$ and quantify how the exact recovery conditions for (3) are 
relaxed relative to unweighted nuclear norm minimization (1). 

4.2.1 Proof of Theorem 2.5 

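Proof. Choosing the weights $R$, $C$ as in (10) and (11), observe that for any $(i,j) \in [n] \times [n]$: 

\[
  \left( \frac{R_i^2}{\sum_{i' \in S_r} R_{i'}^2} + \frac{C_j^2}{\sum_{j' \in S_c} C_{j'}^2} \right) \log^2(2n)
  = \left( \frac{\hat{p}^r_i \sum_{j' \in S_c} \hat{p}^c_{j'} + \hat{p}^c_j \sum_{i' \in S_r} \hat{p}^r_{i'}}{\sum_{i' \in S_r,\, j' \in S_c} \hat{p}^r_{i'} \hat{p}^c_{j'}} \right) \log^2(2n). \tag{38}
\]

Let $\alpha$ be as in the statement of the theorem, so that (12) and (13) hold, and let $\varepsilon \in (0,1)$ be arbitrary. 
By Lemma 2.3 choosing $m$ such that: 

\[
  m = \frac{1}{2} \left( \alpha \left( \min_{i \in [n]} \frac{p^r_i}{\sum_{i'=1}^{n} p^r_{i'}} \wedge \min_{j \in [n]} \frac{p^c_j}{\sum_{j'=1}^{n} p^c_{j'}} \right) \right)^{-2} \log(4\varepsilon^{-1} n)
\]

guarantees that with probability at least $1 - \varepsilon$ for all indices $(i,j) \in [n] \times [n]$: 

\[
  \frac{1}{(1 + \alpha)^2}\, \hat{p}_{ij} \le \frac{p_{ij}}{\sum_{i,j} p_{ij}} \le \frac{1}{(1 - \alpha)^2}\, \hat{p}_{ij}. \tag{39}
\]

Multiplying (38) by $c_0$ and applying (39), in which the normalization constant cancels between numerator and denominator, we have that for any $(i,j) \in [n] \times [n]$: 

\begin{align}
  c_0 \left( \frac{R_i^2}{\sum_{i' \in S_r} R_{i'}^2} + \frac{C_j^2}{\sum_{j' \in S_c} C_{j'}^2} \right) \log^2(2n)
  &\le c_0\, \frac{(1 + \alpha)^2}{(1 - \alpha)^2} \left( \frac{p^r_i \sum_{j' \in S_c} p^c_{j'} + p^c_j \sum_{i' \in S_r} p^r_{i'}}{\sum_{i' \in S_r,\, j' \in S_c} p^r_{i'} p^c_{j'}} \right) \log^2(2n) \notag \\
  &= c_0\, \frac{(1 + \alpha)^2}{(1 - \alpha)^2} \left( \frac{p^r_i \log^2(2n)}{\sum_{i' \in S_r} p^r_{i'}} + \frac{p^c_j \log^2(2n)}{\sum_{j' \in S_c} p^c_{j'}} \right) \notag \\
  &\le c_0\, \frac{(1 + \alpha)^2}{(1 - \alpha)^2} \left( \frac{p^r_i \log^2(2n)}{\sum_{i' \in S_r^*} p^r_{i'}} + \frac{p^c_j \log^2(2n)}{\sum_{j' \in S_c^*} p^c_{j'}} \right) \tag{40} \\
  &\le p_{ij}, \tag{41}
\end{align}

where (40) follows as the sums over the sets $S_r^*$, $S_c^*$ serve as lower bounds for the terms $\sum_{i' \in S_r} p^r_{i'}$, $\sum_{j' \in S_c} p^c_{j'}$, respectively, 
and thus after inverting they serve as upper bounds, and (41) follows from (12) and (13), each of which bounds one of the two terms by $p_{ij}/2$. Again by Theorem 
2.1 we immediately see that (41) is sufficient to guarantee exact recovery of $M$ via (3) with probability at 
least $1 - 5(2n)^{-10}$. □ 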


5 Numerical Experiments 

Here we test the performance of weighted nuclear norm minimization using various weights. We have the 
following experimental setup: the data matrix $M$ is a unit Frobenius norm standard normal Gaussian square 
matrix of dimension $n = 500$. Our sampling distribution is $p = p^r p^c$, where $p^r$, $p^c$ are power law distributed 
with exponent equal to 1.2. Sampling the distribution $p$ $m$ times with replacement, we obtain 
the empirical distribution $\hat{p} = \hat{p}^r \hat{p}^c$. Using this empirical distribution $\hat{p}$ we test nuclear norm minimization 
using the following weights, as was done in [5]: 

1. Unweighted (Uniform Weights): the weights R , C are equal to the uniform weights. 

2. True Weighted: the weights $R$, $C$ satisfy: $R = (p^r)^{1/2}$, $C = (p^c)^{1/2}$. 

3. Empirically Weighted: the weights $R$, $C$ satisfy: $R = (\hat{p}^r)^{1/2}$, $C = (\hat{p}^c)^{1/2}$. 

4. Empirically Smoothed Weights: the weights $R$, $C$ are a linear combination of the empirical weights 
and the uniform weights. Letting $1_n := [1, \ldots, 1]$ be a vector of length $n$ whose coordinates are all 
equal to 1, we set $R = \frac{1}{2}(\hat{p}^r)^{1/2} + \frac{1}{2} 1_n$ and $C = \frac{1}{2}(\hat{p}^c)^{1/2} + \frac{1}{2} 1_n$, i.e. we put half of the mass on 
the empirical distribution and the remaining half of the mass on the uniform weights. 
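The four weight choices listed above can be assembled as in the following sketch. The function name `candidate_weights` and the toy marginals are hypothetical; the uniform weights are taken to be the all-ones vector, and the equal mixing coefficient of 1/2 on each term of the smoothed weights is an assumption based on the "half of the mass" description.

```python
import numpy as np

def candidate_weights(p_row, p_col, p_row_hat, p_col_hat):
    """Diagonal entries of (R, C) for the four weighting schemes."""
    ones_r, ones_c = np.ones(len(p_row)), np.ones(len(p_col))
    return {
        "unweighted": (ones_r, ones_c),
        "true": (np.sqrt(p_row), np.sqrt(p_col)),
        "empirical": (np.sqrt(p_row_hat), np.sqrt(p_col_hat)),
        # Assumed equal mixing of empirical weights and all-ones weights.
        "smoothed": (0.5 * np.sqrt(p_row_hat) + 0.5 * ones_r,
                     0.5 * np.sqrt(p_col_hat) + 0.5 * ones_c),
    }

# Hypothetical two-point marginals, chosen so the square roots are simple.
p_true = np.array([0.64, 0.36])
p_hat = np.array([0.81, 0.19])
weights = candidate_weights(p_true, p_true, p_hat, p_hat)
```

Unlike the purely empirical weights, the smoothed weights stay strictly positive even for rows or columns that never appear in the stage one sample.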

We let the rank of $M$ be 5, 10, 15, 20, 25 and we choose a range of variable sampling rates. For each rank 
and sampling rate test configuration we performed 100 trials. We consider exact recovery to be when the 
output $\hat{M}$ of weighted nuclear norm minimization satisfies $\|\hat{M} - M\|_F \le 10^{-5}$. To execute the weighted nuclear 
norm minimization program we utilized the Augmented Lagrangian Method [10]. We obtained the 
phase transition diagrams shown in Figures 1-5. 

Figure 1: Probability of Exact Recovery when the rank is equal to 5. 

Figure 2: Probability of Exact Recovery when the rank is equal to 10. 

Figure 3: Probability of Exact Recovery when the rank is equal to 15. 

Figure 4: Probability of Exact Recovery when the rank is equal to 20. 

Figure 5: Probability of Exact Recovery when the rank is equal to 25. 

Figure 6: Power Law Sampling with replacement rate vs. Percentage of Unique Samples Revealed. 

Note that we do not perform the two stage sampling method. As the power law sampling distribution 
$p$ is non-uniform, even though we may sample at a rate of $m = O(n_1 n_2)$, the rate at which the percentage of 
unique revealed entries of $M$ grows is in line with the uniform sampling regime we are accustomed to. In 
Figure 6 we show how the percentage of unique entries of $M$ revealed grows with the independent 
sampling with replacement rate $m$. 
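The relationship between the with-replacement sampling rate $m$ and the fraction of distinct entries revealed can be explored with a short simulation. This is a sketch under assumptions: the power law is taken to be $p_i \propto i^{-1.2}$ (the exact parameterization used in the experiments is not stated), and the names `power_law` and `unique_fraction` as well as the toy size are hypothetical.

```python
import numpy as np

def power_law(n, exponent=1.2):
    """Assumed form: p_i proportional to i**(-exponent), normalized to sum to 1."""
    w = np.arange(1, n + 1, dtype=float) ** (-exponent)
    return w / w.sum()

def unique_fraction(p_row, p_col, m, rng):
    """Fraction of distinct (row, column) pairs hit by m draws with replacement."""
    n1, n2 = len(p_row), len(p_col)
    rows = rng.choice(n1, size=m, p=p_row)
    cols = rng.choice(n2, size=m, p=p_col)
    revealed = np.unique(rows * n2 + cols)   # encode each pair as one integer
    return revealed.size / (n1 * n2)

rng = np.random.default_rng(0)
p = power_law(50)
frac = unique_fraction(p, p, m=2500, rng=rng)
```

Because heavy rows and columns are drawn repeatedly, `frac` is typically well below the value one would get if every draw revealed a new entry.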


6 Conclusion 


In this article we extended several weighted nuclear norm minimization results from [4]. In particular 
we extended results where the weights were defined in relation to the true sampling distribution $p$ to 
the weights being defined in relation to the empirical sampling distribution $\hat{p}$. Furthermore, we defined an 
empirical set of weights and established a quantifiable relaxation of exact recovery conditions for weighted 
nuclear norm minimization when compared to the unweighted nuclear norm. To achieve these guarantees we 
used a large deviation bound and a concentration of measure inequality from [7]. We showed that weighted 
nuclear norm minimization is quite robust to the choice of empirically learned weights. Indeed, we used a 
broad range of empirical weights and saw strikingly similar exact recovery gains over unweighted nuclear 
norm minimization. 